Python EDA of Airbnb 2024

In [30]:
#importing modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [31]:
data = pd.read_csv("datasets.csv", encoding_errors="ignore")
In [32]:
data.head()
Out[32]:
id name host_id host_name neighbourhood_group neighbourhood latitude longitude room_type price ... last_review reviews_per_month calculated_host_listings_count availability_365 number_of_reviews_ltm license rating bedrooms beds baths
0 1.312228e+06 Rental unit in Brooklyn · ★5.0 · 1 bedroom 7130382 Walter Brooklyn Clinton Hill 40.683710 -73.964610 Private room 55.0 ... 20/12/15 0.03 1.0 0.0 0.0 No License 5 1 1 Not specified
1 4.527754e+07 Rental unit in New York · ★4.67 · 2 bedrooms ·... 51501835 Jeniffer Manhattan Hell's Kitchen 40.766610 -73.988100 Entire home/apt 144.0 ... 01/05/23 0.24 139.0 364.0 2.0 No License 4.67 2 1 1
2 9.710000e+17 Rental unit in New York · ★4.17 · 1 bedroom · ... 528871354 Joshua Manhattan Chelsea 40.750764 -73.994605 Entire home/apt 187.0 ... 18/12/23 1.67 1.0 343.0 6.0 Exempt 4.17 1 2 1
3 3.857863e+06 Rental unit in New York · ★4.64 · 1 bedroom · ... 19902271 John And Catherine Manhattan Washington Heights 40.835600 -73.942500 Private room 120.0 ... 17/09/23 1.38 2.0 363.0 12.0 No License 4.64 1 1 1
4 4.089661e+07 Condo in New York · ★4.91 · Studio · 1 bed · 1... 61391963 Stay With Vibe Manhattan Murray Hill 40.751120 -73.978600 Entire home/apt 85.0 ... 03/12/23 0.24 133.0 335.0 3.0 No License 4.91 Studio 1 1

5 rows × 22 columns

In [33]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20770 entries, 0 to 20769
Data columns (total 22 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              20770 non-null  float64
 1   name                            20770 non-null  object 
 2   host_id                         20770 non-null  int64  
 3   host_name                       20770 non-null  object 
 4   neighbourhood_group             20770 non-null  object 
 5   neighbourhood                   20763 non-null  object 
 6   latitude                        20763 non-null  float64
 7   longitude                       20763 non-null  float64
 8   room_type                       20763 non-null  object 
 9   price                           20736 non-null  float64
 10  minimum_nights                  20763 non-null  float64
 11  number_of_reviews               20763 non-null  float64
 12  last_review                     20763 non-null  object 
 13  reviews_per_month               20763 non-null  float64
 14  calculated_host_listings_count  20763 non-null  float64
 15  availability_365                20763 non-null  float64
 16  number_of_reviews_ltm           20763 non-null  float64
 17  license                         20770 non-null  object 
 18  rating                          20770 non-null  object 
 19  bedrooms                        20770 non-null  object 
 20  beds                            20770 non-null  int64  
 21  baths                           20770 non-null  object 
dtypes: float64(10), int64(2), object(10)
memory usage: 3.5+ MB
In [34]:
data.shape
Out[34]:
(20770, 22)
In [35]:
data.describe()
Out[35]:
id host_id latitude longitude price minimum_nights number_of_reviews reviews_per_month calculated_host_listings_count availability_365 number_of_reviews_ltm beds
count 2.077000e+04 2.077000e+04 20763.000000 20763.000000 20736.000000 20763.000000 20763.000000 20763.000000 20763.000000 20763.000000 20763.000000 20770.000000
mean 3.033858e+17 1.749049e+08 40.726821 -73.939179 187.714940 28.558493 42.610605 1.257589 18.866686 206.067957 10.848962 1.723592
std 3.901221e+17 1.725657e+08 0.060293 0.061403 1023.245124 33.532697 73.523401 1.904472 70.921443 135.077259 21.354876 1.211993
min 2.595000e+03 1.678000e+03 40.500314 -74.249840 10.000000 1.000000 1.000000 0.010000 1.000000 0.000000 0.000000 1.000000
25% 2.707260e+07 2.041184e+07 40.684159 -73.980755 80.000000 30.000000 4.000000 0.210000 1.000000 87.000000 1.000000 1.000000
50% 4.992852e+07 1.086990e+08 40.722890 -73.949597 125.000000 30.000000 14.000000 0.650000 2.000000 215.000000 3.000000 1.000000
75% 7.220000e+17 3.143997e+08 40.763106 -73.917475 199.000000 30.000000 49.000000 1.800000 5.000000 353.000000 15.000000 2.000000
max 1.050000e+18 5.504035e+08 40.911147 -73.713650 100000.000000 1250.000000 1865.000000 75.490000 713.000000 365.000000 1075.000000 42.000000
In [36]:
#Data cleaning
data.isnull().sum()
Out[36]:
id                                 0
name                               0
host_id                            0
host_name                          0
neighbourhood_group                0
neighbourhood                      7
latitude                           7
longitude                          7
room_type                          7
price                             34
minimum_nights                     7
number_of_reviews                  7
last_review                        7
reviews_per_month                  7
calculated_host_listings_count     7
availability_365                   7
number_of_reviews_ltm              7
license                            0
rating                             0
bedrooms                           0
beds                               0
baths                              0
dtype: int64
In [37]:
data[pd.isnull(data['price'])].head(2)
Out[37]:
id name host_id host_name neighbourhood_group neighbourhood latitude longitude room_type price ... last_review reviews_per_month calculated_host_listings_count availability_365 number_of_reviews_ltm license rating bedrooms beds baths
15913 47047419.0 Guest suite in New York · ★4.88 · 8 bedrooms ·... 380290574 Karen Manhattan NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN OSE-STRREG-0000018 4.88 8 9 7
15914 38906302.0 Loft in Brooklyn · ★4.71 · 1 bedroom · 1 bed ·... 51954926 K Brooklyn NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN No License 4.71 1 1 2

2 rows × 22 columns

In [38]:
# Fill missing prices with a placeholder value of 10 (the minimum observed price).
# Assigning the result back avoids the chained-assignment pitfall of fillna(inplace=True).
data['price'] = data['price'].fillna(10)
In [39]:
# Drop every row that still has a missing value
data.dropna(inplace=True)
data.isnull().sum()
Out[39]:
id                                0
name                              0
host_id                           0
host_name                         0
neighbourhood_group               0
neighbourhood                     0
latitude                          0
longitude                         0
room_type                         0
price                             0
minimum_nights                    0
number_of_reviews                 0
last_review                       0
reviews_per_month                 0
calculated_host_listings_count    0
availability_365                  0
number_of_reviews_ltm             0
license                           0
rating                            0
bedrooms                          0
beds                              0
baths                             0
dtype: int64
In [40]:
data.shape
Out[40]:
(20763, 22)
In [41]:
# Check for duplicate rows
data.duplicated().sum()
Out[41]:
12
In [42]:
data[data.duplicated()]
Out[42]:
id name host_id host_name neighbourhood_group neighbourhood latitude longitude room_type price ... last_review reviews_per_month calculated_host_listings_count availability_365 number_of_reviews_ltm license rating bedrooms beds baths
6 4.527754e+07 Rental unit in New York · ★4.67 · 2 bedrooms ·... 51501835 Jeniffer Manhattan Hell's Kitchen 40.766610 -73.988100 Entire home/apt 144.0 ... 01/05/23 0.24 139.0 364.0 2.0 No License 4.67 2 1 1
7 9.710000e+17 Rental unit in New York · ★4.17 · 1 bedroom · ... 528871354 Joshua Manhattan Chelsea 40.750764 -73.994605 Entire home/apt 187.0 ... 18/12/23 1.67 1.0 343.0 6.0 Exempt 4.17 1 2 1
8 3.857863e+06 Rental unit in New York · ★4.64 · 1 bedroom · ... 19902271 John And Catherine Manhattan Washington Heights 40.835600 -73.942500 Private room 120.0 ... 17/09/23 1.38 2.0 363.0 12.0 No License 4.64 1 1 1
9 4.089661e+07 Condo in New York · ★4.91 · Studio · 1 bed · 1... 61391963 Stay With Vibe Manhattan Murray Hill 40.751120 -73.978600 Entire home/apt 85.0 ... 03/12/23 0.24 133.0 335.0 3.0 No License 4.91 Studio 1 1
10 4.958498e+07 Rental unit in New York · ★5.0 · 1 bedroom · 1... 51501835 Jeniffer Manhattan Hell's Kitchen 40.759950 -73.992960 Entire home/apt 115.0 ... 29/07/23 0.16 139.0 276.0 2.0 No License 5 1 1 1
20736 7.990000e+17 Rental unit in New York · 2 bedrooms · 2 beds ... 224733902 CozySuites Copake Manhattan Upper East Side 40.768970 -73.957592 Entire home/apt 153.0 ... 15/09/23 0.41 8.0 308.0 2.0 No License No rating 2 2 2
20737 5.930000e+17 Rental unit in New York · ★4.79 · 2 bedrooms ·... 23219783 Rob Manhattan West Village 40.730220 -74.002910 Entire home/apt 175.0 ... 22/11/23 2.03 4.0 129.0 25.0 No License 4.79 2 2 1
20738 9.230000e+17 Loft in New York · ★4.33 · 1 bedroom · 2 beds ... 520265731 Rodrigo Manhattan Greenwich Village 40.728390 -73.999540 Entire home/apt 156.0 ... 02/01/24 2.60 1.0 356.0 9.0 Exempt 4.33 1 2 1
20739 1.336161e+07 Rental unit in New York · ★4.89 · 2 bedrooms ·... 8961407 Jamie Manhattan Harlem 40.805700 -73.946250 Entire home/apt 397.0 ... 08/09/23 1.08 3.0 274.0 3.0 No License 4.89 2 2 1
20740 5.119566e+07 Rental unit in New York · Studio · 1 bed · 1 bath 51501835 Jeniffer Manhattan Chinatown 40.718360 -73.995850 Entire home/apt 100.0 ... 25/05/23 0.08 139.0 306.0 1.0 No License No rating Studio 1 1
20741 2.523473e+07 Rental unit in New York · ★4.41 · 1 bedroom · ... 1497427 Mara Manhattan Upper East Side 40.774030 -73.950580 Entire home/apt 120.0 ... 31/03/23 0.26 2.0 364.0 1.0 No License 4.41 1 2 1
20742 3.339399e+06 Rental unit in New York · ★4.73 · 1 bedroom · ... 2119276 Urban Furnished Manhattan West Village 40.732030 -74.006760 Entire home/apt 143.0 ... 09/05/23 0.20 54.0 285.0 2.0 No License 4.73 1 1 1

12 rows × 22 columns

In [43]:
data.drop_duplicates(inplace=True)
In [44]:
data.duplicated().sum()
Out[44]:
0

EDA

In [45]:
#price distribution
sns.histplot(data=data, x ="price")
Out[45]:
<Axes: xlabel='price', ylabel='Count'>
[Figure: histogram of price before outlier removal]
In [46]:
#dealing with outliers
sns.boxenplot(data=data, x = "price")
Out[46]:
<Axes: xlabel='price'>
[Figure: boxen plot of price before outlier removal]
In [47]:
Q1 = data['price'].quantile(0.25)
Q3 = data['price'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = data[(data['price'] < lower_bound) | (data['price'] > upper_bound)]
print(outliers)
                 id                                               name  \
20     2.992360e+07  Serviced apartment in New York City · Studio ·...   
22     3.471850e+07  Condo in New York · 2 bedrooms · 2 beds · 2 baths   
52     1.027808e+06  Loft in New York · ★4.87 · 2 bedrooms · 1 bed ...   
54     1.146653e+06  Condo in New York · ★4.90 · 2 bedrooms · 1 bed...   
69     8.070000e+17  Home in Queens · ★4.50 · 6 bedrooms · 8 beds ·...   
...             ...                                                ...   
20669  4.023946e+07  Townhouse in Brooklyn · ★4.50 · 2 bedrooms · 2...   
20687  3.100156e+07  Boutique hotel in New York · ★5.0 · 1 bedroom ...   
20703  2.720010e+07  Rental unit in Brooklyn · ★4.84 · 2 bedrooms ·...   
20723  4.550539e+07  Rental unit in New York · ★5.0 · 2 bedrooms · ...   
20732  1.336161e+07  Rental unit in New York · ★4.89 · 2 bedrooms ·...   

         host_id          host_name neighbourhood_group       neighbourhood  \
20     220229838      Chamber Hotel           Manhattan             Midtown   
22       6674394                Eri           Manhattan     Upper West Side   
52       5655889  East Village Loft           Manhattan        East Village   
54        836168              Henry           Manhattan     Upper West Side   
69     496749638                  B              Queens    South Ozone Park   
...          ...                ...                 ...                 ...   
20669    3591955            Charles            Brooklyn  Bedford-Stuyvesant   
20687   26556695  Justin And Alyssa           Manhattan             Midtown   
20703     416361                  O            Brooklyn  Bedford-Stuyvesant   
20723   25596933               Anna           Manhattan             Tribeca   
20732    8961407              Jamie           Manhattan              Harlem   

        latitude  longitude        room_type   price  ...  last_review  \
20     40.761760 -73.976590       Hotel room  1000.0  ...     02/07/19   
22     40.792830 -73.971900  Entire home/apt   425.0  ...     25/06/19   
52     40.721540 -73.981740  Entire home/apt   450.0  ...     05/11/23   
54     40.792960 -73.964990  Entire home/apt  1000.0  ...     27/01/16   
69     40.664932 -73.815531  Entire home/apt   900.0  ...     13/10/23   
...          ...        ...              ...     ...  ...          ...   
20669  40.685340 -73.934620  Entire home/apt   400.0  ...     17/04/23   
20687  40.762170 -73.973490     Private room  1150.0  ...     30/09/23   
20703  40.680390 -73.943680  Entire home/apt   395.0  ...     26/07/23   
20723  40.720640 -74.009350  Entire home/apt   700.0  ...     22/11/22   
20732  40.805700 -73.946250  Entire home/apt   397.0  ...     08/09/23   

       reviews_per_month calculated_host_listings_count  availability_365  \
20                  0.02                           10.0             363.0   
22                  0.02                            1.0             358.0   
52                  0.42                            1.0             355.0   
54                  0.18                           10.0             364.0   
69                  1.40                            2.0             365.0   
...                  ...                            ...               ...   
20669               0.14                            1.0              83.0   
20687               0.09                            6.0             365.0   
20703               0.65                            1.0             178.0   
20723               0.10                            1.0             155.0   
20732               1.08                            3.0             274.0   

       number_of_reviews_ltm     license     rating bedrooms beds baths  
20                       0.0      Exempt  No rating   Studio    2     1  
22                       0.0  No License  No rating        2    2     2  
52                       2.0  No License       4.87        2    1     2  
54                       0.0  No License        4.9        2    1     1  
69                      14.0  No License        4.5        6    8     4  
...                      ...         ...        ...      ...  ...   ...  
20669                    1.0  No License        4.5        2    2     2  
20687                    2.0      Exempt          5        1    1     1  
20703                    4.0  No License       4.84        2    2   1.5  
20723                    0.0  No License          5        2    2     2  
20732                    3.0  No License       4.89        2    2     1  

[1382 rows x 22 columns]
In [48]:
data = data[(data['price'] >= lower_bound) & (data['price'] <= upper_bound)]
In [49]:
sns.histplot(data=data, x ="price")
Out[49]:
<Axes: xlabel='price', ylabel='Count'>
[Figure: histogram of price after outlier removal]
In [50]:
sns.boxenplot(data=data, x = "price")
Out[50]:
<Axes: xlabel='price'>
[Figure: boxen plot of price after outlier removal]
In [51]:
data.groupby(by="neighbourhood_group")["price"].mean()
Out[51]:
neighbourhood_group
Bronx            102.182013
Brooklyn         133.538136
Manhattan        155.673090
Queens           112.401144
Staten Island    110.576923
Name: price, dtype: float64
In [52]:
data["price per bed"]=data["price"]/data["beds"]
data.head()
Out[52]:
id name host_id host_name neighbourhood_group neighbourhood latitude longitude room_type price ... reviews_per_month calculated_host_listings_count availability_365 number_of_reviews_ltm license rating bedrooms beds baths price per bed
0 1.312228e+06 Rental unit in Brooklyn · ★5.0 · 1 bedroom 7130382 Walter Brooklyn Clinton Hill 40.683710 -73.964610 Private room 55.0 ... 0.03 1.0 0.0 0.0 No License 5 1 1 Not specified 55.0
1 4.527754e+07 Rental unit in New York · ★4.67 · 2 bedrooms ·... 51501835 Jeniffer Manhattan Hell's Kitchen 40.766610 -73.988100 Entire home/apt 144.0 ... 0.24 139.0 364.0 2.0 No License 4.67 2 1 1 144.0
2 9.710000e+17 Rental unit in New York · ★4.17 · 1 bedroom · ... 528871354 Joshua Manhattan Chelsea 40.750764 -73.994605 Entire home/apt 187.0 ... 1.67 1.0 343.0 6.0 Exempt 4.17 1 2 1 93.5
3 3.857863e+06 Rental unit in New York · ★4.64 · 1 bedroom · ... 19902271 John And Catherine Manhattan Washington Heights 40.835600 -73.942500 Private room 120.0 ... 1.38 2.0 363.0 12.0 No License 4.64 1 1 1 120.0
4 4.089661e+07 Condo in New York · ★4.91 · Studio · 1 bed · 1... 61391963 Stay With Vibe Manhattan Murray Hill 40.751120 -73.978600 Entire home/apt 85.0 ... 0.24 133.0 335.0 3.0 No License 4.91 Studio 1 1 85.0

5 rows × 23 columns

In [53]:
data.groupby(by="neighbourhood_group")["price per bed"].mean()
Out[53]:
neighbourhood_group
Bronx             72.611098
Brooklyn          92.446518
Manhattan        117.843694
Queens            74.353753
Staten Island     67.112188
Name: price per bed, dtype: float64
In [54]:
data.columns
Out[54]:
Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365', 'number_of_reviews_ltm', 'license', 'rating',
       'bedrooms', 'beds', 'baths', 'price per bed'],
      dtype='object')
In [55]:
sns.barplot(data=data, x = "neighbourhood_group", y = "price", hue="room_type")
Out[55]:
<Axes: xlabel='neighbourhood_group', ylabel='price'>
[Figure: bar plot of mean price by neighbourhood_group, split by room_type]
In [56]:
#number of reviews and price dependency
sns.scatterplot(data=data, x = "number_of_reviews", y = "price", hue="neighbourhood_group")
Out[56]:
<Axes: xlabel='number_of_reviews', ylabel='price'>
[Figure: scatter plot of price against number_of_reviews, colored by neighbourhood_group]
In [57]:
sns.pairplot(data=data, vars=["price", 'availability_365', 'price per bed'], hue="neighbourhood_group")
Out[57]:
<seaborn.axisgrid.PairGrid at 0x2082e582f20>
[Figure: pair plot of price, availability_365, and price per bed, colored by neighbourhood_group]
In [ ]:
# last_review is stored as day/month/two-digit year (e.g. 20/12/15), so pass the format explicitly
data['last_review'] = pd.to_datetime(data['last_review'], format='%d/%m/%y', errors='coerce')

# Month of the most recent review, used here as a rough proxy for season
data['review_month'] = data['last_review'].dt.month

monthly_price_trends = data.groupby('review_month')['price'].mean()

plt.figure(figsize=(10, 6))
monthly_price_trends.plot(kind='line', marker='o', color='blue')
plt.title('Average Price Across Months', fontsize=16)
plt.xlabel('Month', fontsize=14)
plt.ylabel('Average Price', fontsize=14)
plt.xticks(ticks=range(1, 13), labels=[
    'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'
], fontsize=12)
plt.grid(visible=True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
[Figure: line plot of average price by month of last review]
In [59]:
# Create a new column categorizing hosts by the number of listings
data['listing_category'] = data['calculated_host_listings_count'].apply(
    lambda x: 'Single Listing' if x == 1 else 'Multiple Listings'
)

# Calculate average price for single and multiple listing hosts
price_by_listing_category = data.groupby('listing_category')['price'].mean()

plt.figure(figsize=(8, 6))
# Assign the x variable to hue so the palette applies without the seaborn deprecation warning
sns.barplot(
    x=price_by_listing_category.index,
    y=price_by_listing_category.values,
    hue=price_by_listing_category.index,
    palette='viridis',
    legend=False,
)
plt.title('Average Price: Single vs. Multiple Listings', fontsize=16)
plt.xlabel('Host Listing Category', fontsize=14)
plt.ylabel('Average Price', fontsize=14)
plt.grid(visible=True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
[Figure: bar plot of average price for single-listing vs. multiple-listing hosts]
In [61]:
!pip install folium
In [62]:
import folium
from folium.plugins import MarkerCluster

# Create a base map centered around the average latitude and longitude
center_lat, center_lon = data['latitude'].mean(), data['longitude'].mean()
map_airbnb = folium.Map(location=[center_lat, center_lon], zoom_start=12)

# Add a MarkerCluster layer to group close points together
marker_cluster = MarkerCluster().add_to(map_airbnb)

# Iterate through the data and add markers to the map
for _, row in data.iterrows():
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=f"Price: ${row['price']}<br>Room Type: {row['room_type']}<br>Neighbourhood: {row['neighbourhood']}",
        tooltip=row['name']
    ).add_to(marker_cluster)

# Display the map
map_airbnb
Out[62]:
[Interactive folium map: clustered markers, one per listing]
In [63]:
# Define categories based on price
def customer_segment(row):
    if row['price'] < 100:
        return 'Budget-Conscious'
    elif 100 <= row['price'] <= 300:
        return 'Mid-Range'
    else:
        return 'Luxury-Seeking'

# Apply the function to create a new column
data['customer_segment'] = data.apply(customer_segment, axis=1)

# Analyze customer segments based on reviews and neighborhoods
segmentation_summary = data.groupby('customer_segment').agg({
    'price': 'mean',
    'number_of_reviews': 'mean',
    'neighbourhood_group': lambda x: x.mode()[0]  # Most common neighborhood group
}).reset_index()

print(segmentation_summary)

# Visualize the segmentation
plt.figure(figsize=(10, 6))
# Assign the x variable to hue so the palette applies without the seaborn deprecation warning
sns.boxplot(data=data, x='customer_segment', y='price', hue='customer_segment', palette='viridis', legend=False)
plt.title('Price Distribution Across Customer Segments', fontsize=16)
plt.xlabel('Customer Segment', fontsize=14)
plt.ylabel('Price', fontsize=14)
plt.grid(visible=True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
   customer_segment       price  number_of_reviews neighbourhood_group
0  Budget-Conscious   66.539350          41.806987            Brooklyn
1    Luxury-Seeking  337.304636          30.572185           Manhattan
2         Mid-Range  167.659678          45.667140           Manhattan
[Figure: box plot of price by customer segment]
In [65]:
# Create a column categorizing hosts based on reviews
def host_reputation(row):
    if row['number_of_reviews'] > 50:
        return 'Highly Reviewed'
    elif 10 <= row['number_of_reviews'] <= 50:
        return 'Moderately Reviewed'
    else:
        return 'Lowly Reviewed'

data['host_reputation'] = data.apply(host_reputation, axis=1)

# Analyze host reputation
host_reputation_summary = data.groupby('host_reputation').agg({
    'number_of_reviews': 'mean',
    'reviews_per_month': 'mean',
    'price': 'mean'
}).reset_index()

print(host_reputation_summary)

# Visualize host reputation
plt.figure(figsize=(10, 6))
# Assign the x variable to hue so the palette applies without the seaborn deprecation warning
sns.barplot(data=host_reputation_summary, x='host_reputation', y='price', hue='host_reputation', palette='coolwarm', legend=False)
plt.title('Average Price Across Host Reputation Categories', fontsize=16)
plt.xlabel('Host Reputation', fontsize=14)
plt.ylabel('Average Price', fontsize=14)
plt.grid(visible=True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
       host_reputation  number_of_reviews  reviews_per_month       price
0      Highly Reviewed         134.388100           2.693772  134.994853
1       Lowly Reviewed           3.674211           0.435785  136.414309
2  Moderately Reviewed          24.818875           1.243042  135.838847
[Figure: bar plot of average price by host reputation category]

Conclusion for Airbnb Dataset Analysis

The exploratory data analysis of the Airbnb dataset provided valuable insights into pricing, customer behavior, host activity, and geographical distribution. Below are the key takeaways:

1. Data Cleaning and Preparation

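  • Filled the 34 missing prices with a placeholder value of 10 and dropped the 7 rows that were missing other fields, leaving 20,763 of the original 20,770 rows.
  • Removed 12 duplicate rows, then dropped the 1,382 listings whose price fell outside the Interquartile Range (IQR) fences, Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.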
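
The cleaning steps above can be collected into one function. The following is a minimal sketch rather than the notebook's exact code: the function name `clean_listings`, the `price_fill` parameter, and the toy values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_listings(df: pd.DataFrame, price_fill: float = 10.0) -> pd.DataFrame:
    """Fill missing prices, drop incomplete and duplicate rows, trim IQR outliers."""
    out = df.copy()
    # Assign the result back instead of calling fillna(inplace=True) on a column.
    out["price"] = out["price"].fillna(price_fill)
    out = out.dropna().drop_duplicates()

    # Tukey fences: keep prices within 1.5 * IQR of the quartiles.
    q1, q3 = out["price"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = out["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[in_range].reset_index(drop=True)


# Toy frame (made-up values): one missing price, one missing borough,
# one exact duplicate, and one extreme price.
toy = pd.DataFrame({
    "price": [60.0, 100.0, 120.0, 160.0, 160.0, np.nan, 90.0, 5000.0],
    "neighbourhood_group": ["Bronx", "Queens", "Brooklyn", "Manhattan",
                            "Manhattan", "Queens", None, "Manhattan"],
})
cleaned = clean_listings(toy)
print(cleaned)
```

Returning a new frame instead of mutating in place keeps the raw data available for comparison, which helps when checking how many rows each step removed.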

2. Price Analysis

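  • General Trends: In the raw data the mean price was about $188 against a median of $125, a gap driven by a few extreme listings (the maximum was $100,000). After outlier removal, Manhattan had the highest average price (about $156), followed by Brooklyn ($134), Queens ($112), Staten Island ($111), and the Bronx ($102).
  • Room Type Comparison: In the grouped bar plot, entire homes/apartments were priced above private rooms, shared rooms, and hotel rooms. The notebook does not print the per-room-type means, so the size of the gap is not quantified here.
  • Price Per Bed: Mean price per bed followed a similar ranking: Manhattan (about $118), Brooklyn ($92), Queens ($74), the Bronx ($73), and Staten Island ($67).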
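
To put numbers behind the room-type comparison, the plot could be paired with a grouped summary table. This is a sketch on made-up sample rows; `price_summary` and the column name `price_per_bed` are hypothetical names chosen here, and with the notebook's cleaned DataFrame one would pass `data` instead of `sample`.

```python
import pandas as pd

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Private room", "Private room"],
    "price": [200.0, 90.0, 180.0, 150.0, 70.0, 60.0],
    "beds": [2, 1, 1, 3, 1, 1],
})


def price_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Listing count, mean/median price and mean price per bed by borough and room type."""
    df = df.assign(price_per_bed=df["price"] / df["beds"])
    return (
        df.groupby(["neighbourhood_group", "room_type"])
        .agg(
            listings=("price", "size"),
            mean_price=("price", "mean"),
            median_price=("price", "median"),
            mean_price_per_bed=("price_per_bed", "mean"),
        )
        .round(2)
    )


summary = price_summary(sample)
print(summary)
```

Reporting the listing count next to each mean shows when a group average rests on only a handful of rows.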

3. Seasonal Trends

  • Monthly Price Variation: The plot showed only slight month-to-month variation in average price and no clear seasonal pattern. The month is taken from each listing's most recent review rather than from booking dates, so it is only a rough proxy for seasonality.
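
Because `last_review` is stored as day/month/two-digit year, the month extracted from it depends on how the strings are parsed. The sketch below assumes every value follows the `%d/%m/%y` layout seen in the sample rows; the prices and the invalid entry are made up for illustration.

```python
import pandas as pd

# Dates in the day/month/two-digit-year layout of the last_review column,
# plus one invalid entry to show what errors="coerce" does.
frame = pd.DataFrame({
    "last_review": ["20/12/15", "01/05/23", "18/12/23", "03/12/23", "not a date"],
    "price": [55.0, 144.0, 187.0, 85.0, 100.0],
})

# An explicit format keeps "01/05/23" as 1 May 2023 rather than 5 January 2023.
frame["last_review"] = pd.to_datetime(
    frame["last_review"], format="%d/%m/%y", errors="coerce"
)

# Drop rows whose date could not be parsed, then average price per month.
valid = frame.dropna(subset=["last_review"])
monthly = valid.groupby(valid["last_review"].dt.month)["price"].agg(["mean", "count"])
print(monthly)
```

Keeping the `count` column next to the mean shows how many listings each monthly average is based on.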

4. Host Behavior

  • Single vs. Multiple Listings: Hosts with multiple listings tended to charge higher average prices than single-listing hosts, indicating potential pricing strategies for professional hosts.
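  • Host Reputation: Average price was nearly identical across review tiers: about $135 for highly reviewed listings and $136 for both moderately and lowly reviewed ones. Review count therefore showed no meaningful relationship with price in this dataset.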
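
The two host categorizations can also be written without row-wise `apply`. The sketch below uses `np.where` and `np.select` with the same thresholds as the notebook (one listing vs. more; over 50, 10 to 50, and under 10 reviews). The toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows covering each threshold boundary.
hosts = pd.DataFrame({
    "calculated_host_listings_count": [1, 139, 1, 2, 133],
    "number_of_reviews": [3, 9, 10, 50, 51],
    "price": [55.0, 144.0, 187.0, 120.0, 85.0],
})

# Single vs. multiple listings: a two-way split fits np.where.
hosts["listing_category"] = np.where(
    hosts["calculated_host_listings_count"] == 1,
    "Single Listing",
    "Multiple Listings",
)

# Review tiers: np.select takes the first matching condition per row.
reviews = hosts["number_of_reviews"]
hosts["host_reputation"] = np.select(
    [reviews > 50, reviews >= 10],
    ["Highly Reviewed", "Moderately Reviewed"],
    default="Lowly Reviewed",
)

summary = hosts.groupby("host_reputation")["price"].agg(
    mean_price="mean", listings="count"
)
print(summary)
```

On a frame of roughly 20,000 rows, these vectorized calls avoid a Python-level function call per row.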

5. Customer Segmentation

  • Segmented listings into Budget-Conscious (under $100), Mid-Range ($100 to $300), and Luxury-Seeking (over $300) based on price:
    • Budget-Conscious listings averaged about $67 and were most common in Brooklyn.
    • Mid-Range listings averaged about $168 and were most common in Manhattan.
    • Luxury-Seeking listings averaged about $337 and were most common in Manhattan.
  • Review counts also differed by segment: Mid-Range listings averaged about 46 reviews, Budget-Conscious about 42, and Luxury-Seeking about 31.
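
The segmentation summary reports only the single most common borough per segment, which hides how each segment is spread across boroughs. As a sketch of one way to see the full breakdown, the code below builds the segments with the notebook's price thresholds and uses `pd.crosstab` with `normalize="index"` to get borough shares per segment. The sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows covering the threshold boundaries (100 and 300).
listings = pd.DataFrame({
    "price": [60.0, 99.0, 100.0, 300.0, 301.0, 350.0],
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan",
                            "Queens", "Manhattan", "Manhattan"],
})

# Same rules as the notebook: under 100, 100 to 300 inclusive, over 300.
price = listings["price"]
listings["customer_segment"] = np.select(
    [price < 100, price <= 300],
    ["Budget-Conscious", "Mid-Range"],
    default="Luxury-Seeking",
)

# Each row of `shares` sums to 1: the borough mix within one segment.
shares = pd.crosstab(
    listings["customer_segment"],
    listings["neighbourhood_group"],
    normalize="index",
)
print(shares.round(2))
```

A share table like this makes it clear whether the leading borough holds most of a segment or only a narrow plurality.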

6. Geospatial Analysis

  • Cluster Visualization: The interactive map groups listing markers into clusters, with the densest clusters in Manhattan and Brooklyn.
  • Demand Areas: On the map, Manhattan neighborhoods such as Chelsea and Hell's Kitchen appeared to hold a larger concentration of expensive listings, while Brooklyn offered options at varying price points. The notebook does not compute neighbourhood-level statistics, so these remain visual impressions.
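
Neighbourhood-level impressions from the map could be checked with a simple aggregation. The sketch below ranks neighbourhoods by median price while ignoring those with very few listings. The helper `top_neighbourhoods`, its `min_listings` and `n` parameters, and all prices are assumptions made for illustration, not results from the dataset.

```python
import pandas as pd

# Made-up prices attached to neighbourhood names that appear in the notebook.
sample = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 6 + ["Brooklyn"] * 2,
    "neighbourhood": ["Chelsea", "Chelsea", "Chelsea",
                      "Hell's Kitchen", "Hell's Kitchen", "Harlem",
                      "Clinton Hill", "Clinton Hill"],
    "price": [187.0, 200.0, 150.0, 144.0, 115.0, 397.0, 55.0, 90.0],
})


def top_neighbourhoods(df: pd.DataFrame, min_listings: int = 2, n: int = 3) -> pd.DataFrame:
    """Rank neighbourhoods by median price, skipping those with too few listings."""
    stats = (
        df.groupby(["neighbourhood_group", "neighbourhood"])
        .agg(listings=("price", "size"), median_price=("price", "median"))
        .reset_index()
    )
    stats = stats[stats["listings"] >= min_listings]
    return (
        stats.sort_values("median_price", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )


top = top_neighbourhoods(sample)
print(top)
```

The minimum-listing filter stops a single expensive listing (Harlem in the toy data) from topping the ranking on its own.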

Key Recommendations

  1. For Hosts:

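    • Price Optimization: Hosts in higher-priced boroughs such as Manhattan can support higher prices, particularly for entire homes/apartments.
    • Reputation Building: Encouraging reviews and maintaining a high rating remains sensible, but average price was essentially flat across review tiers in this dataset, so review volume alone does not appear to justify premium pricing.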
  2. For Customers:

    • Budget-Friendly Options: Travelers seeking affordable options should explore Staten Island and the Bronx.
    • Luxury Experiences: Manhattan is ideal for high-end experiences, particularly in iconic neighborhoods.
  3. Platform Improvements:

    • Dynamic Pricing Tools: Introduce dynamic pricing tools to help hosts optimize revenue based on seasonal and regional demand.
    • Targeted Recommendations: Use customer segmentation insights to recommend listings tailored to user preferences.